Selection of Best Outlier Detection Method Using Regression Analysis
Abstract
Outliers are unusual data values that are inconsistent with most of the records. Such non-representative records can seriously affect the fitted model, so detecting outliers is an important step toward higher accuracy. Several outlier detection methods are used in the literature for real as well as simulated data sets. The aim of this study is to compare two outlier detection methods, Cook's distance and the Mahalanobis method, with the standard method for outlier detection. Fifteen replicates of a simple linear regression relating total household expenditure and household size are used to find the most efficient outlier detection method. The standard method of outlier detection is found to produce smaller residuals when predicting the actual values than the other two methods. Of the other two, Cook's method produces less prediction error than the Mahalanobis method.
Keywords: Outlier; Outlier Detection; Regression; Analysis; Data; Applications
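As an illustration of the two diagnostics named in the abstract, the following is a minimal sketch on synthetic data, not the authors' procedure or data. The variable names, the simulated household values, the planted outlier, and both cutoffs (4/n for Cook's distance and the 0.975 chi-square quantile with 2 degrees of freedom for the squared Mahalanobis distance) are assumptions based on common conventions. Treating expenditure as the response is also an assumption.

```python
import numpy as np

def cooks_distance(x, y):
    """Cook's distance of each observation in the fit y ~ a + b*x."""
    X = np.column_stack([np.ones_like(x), x])       # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # ordinary least squares
    resid = y - X @ beta
    n, p = X.shape
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages (hat-matrix diagonal)
    mse = resid @ resid / (n - p)
    return (resid**2 / (p * mse)) * (h / (1.0 - h) ** 2)

def mahalanobis_sq(x, y):
    """Squared Mahalanobis distance of each (x, y) pair from the sample mean."""
    Z = np.column_stack([x, y])
    diff = Z - Z.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(Z, rowvar=False))
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Synthetic stand-in for the household data (hypothetical values).
rng = np.random.default_rng(0)
n = 50
household_size = rng.integers(1, 10, size=n).astype(float)
expenditure = 200.0 + 150.0 * household_size + rng.normal(0.0, 40.0, size=n)

# Plant one gross outlier at index 0 so both diagnostics have something to find.
household_size[0] = 9.0
expenditure[0] = 200.0 + 150.0 * 9.0 + 3000.0

cooks = cooks_distance(household_size, expenditure)
d2 = mahalanobis_sq(household_size, expenditure)

cooks_cutoff = 4.0 / n                 # common rule of thumb (assumption)
d2_cutoff = -2.0 * np.log(0.025)       # chi-square 0.975 quantile, 2 df (assumption)

cooks_flags = np.flatnonzero(cooks > cooks_cutoff)
d2_flags = np.flatnonzero(d2 > d2_cutoff)
print("Flagged by Cook's distance:", cooks_flags)
print("Flagged by Mahalanobis distance:", d2_flags)
```

A comparison in the spirit of the abstract would then refit the regression after dropping the points flagged by each method and compare the resulting prediction errors. That step is left out here because the source does not describe how its "standard method" is defined.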
Similar Resources
Outlier Detection by Boosting Regression Trees
A procedure for detecting outliers in regression problems is proposed. It is based on information provided by boosting regression trees. The key idea is to select the most frequently resampled observation along the boosting iterations and reiterate after removing it. The selection criterion is based on Tchebychev’s inequality applied to the maximum over the boosting iterations of ...
A statistical test for outlier identification in data envelopment analysis
In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these ‘‘deterministic’’ frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the prese...
Analysis of a Problem Using Various Visions
In this paper, an applied problem in which the response of interest is the number of successes in a specific experiment is considered and studied from various viewpoints. The effects of outlying response values on the results of a regression analysis are important enough to warrant study. For this reason, outlying response values are identified using diagnostic methods. It is shown that use of arc-sine ...
Predicting software project effort: A grey relational analysis based method
The inherent uncertainty of the software development process presents particular challenges for software effort prediction. We need to systematically address missing data values, outlier detection, feature subset selection and the continuous evolution of predictions as the project unfolds, and all of this in the context of data-starvation and noisy data. However, in this paper, we particularly ...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. But the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased estimation of parameters and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...